63 research outputs found

    Rapid Etiological Classification of Meningitis by NMR Spectroscopy Based on Metabolite Profiles and Host Response

    Bacterial meningitis is an acute disease with high mortality that is reduced by early treatment. Identification of the causative microorganism by culture is sensitive but slow. Large volumes of cerebrospinal fluid (CSF) are required to maximise sensitivity and establish a provisional diagnosis. We have utilised nuclear magnetic resonance (NMR) spectroscopy to rapidly characterise the biochemical profile of CSF from normal rats and animals with pneumococcal or cryptococcal meningitis. Use of a miniaturised capillary NMR system overcame limitations caused by small CSF volumes and low metabolite concentrations. The analysis of the complex NMR spectroscopic data by a supervised statistical classification strategy included major, minor and unidentified metabolites. Reproducible spectral profiles were generated within less than three minutes, and revealed differences in the relative amounts of glucose, lactate, citrate, amino acid residues, acetate and polyols in the three groups. Contributions from microbial metabolism and inflammatory cells were evident. The computerised statistical classification strategy is based on both major metabolites and minor, partially unidentified metabolites. This data analysis proved highly specific for diagnosis (100% specificity in the final validation set), provided those with visible blood contamination were excluded from analysis; 6-8% of samples were classified as indeterminate. This proof of principle study suggests that a rapid etiologic diagnosis of meningitis is possible without prior culture. The method can be fully automated and avoids delays due to processing and selective identification of specific pathogens that are inherent in DNA-based techniques

    GeneSrF and varSelRF: a web-based tool and R package for gene selection and classification using random forest

    Background: Microarray data are often used for patient classification and gene selection. An appropriate tool for end users and biomedical researchers should combine user friendliness with statistical rigor, including carefully avoiding selection biases and allowing analysis of multiple solutions, together with access to additional functional information on selected genes. Methodologically, such a tool would be of greater use if it incorporated state-of-the-art computational approaches and made source code available. Results: We have developed GeneSrF, a web-based tool, and varSelRF, an R package, that implement, in the context of patient classification, a validated method for selecting very small sets of genes while preserving classification accuracy. Computation is parallelized, making it possible to take advantage of multicore CPUs and clusters of workstations. Output includes bootstrapped estimates of prediction error rate, and assessments of the stability of the solutions. Clickable tables link to additional information for each gene (GO terms, PubMed citations, KEGG pathways), and output can be sent to PaLS for examination of PubMed references, GO terms, and KEGG and Reactome pathways characteristic of sets of genes selected for class prediction. The full source code is available, so the software can be extended. The web-based application is available from http://genesrf2.bioinfo.cnio.es. All source code is available from Bioinformatics.org or The Launchpad. The R package is also available from CRAN. Conclusion: varSelRF and GeneSrF implement a validated method for gene selection including bootstrap estimates of classification error rate. They are valuable tools for applied biomedical researchers, especially for exploratory work with microarray data. Because of the underlying technology used (combination of parallelization with web-based application) they are also of methodological interest to bioinformaticians and biostatisticians.

    A new molecular breast cancer subclass defined from a large scale real-time quantitative RT-PCR study

    BACKGROUND: Current histo-pathological prognostic factors are not very helpful in predicting the clinical outcome of breast cancer due to the disease's heterogeneity. Molecular profiling using a large panel of genes could help to classify breast tumours and to define signatures which are predictive of their clinical behaviour. METHODS: To this aim, quantitative RT-PCR amplification was used to study the RNA expression levels of 47 genes in 199 primary breast tumours and 6 normal breast tissues. Genes were selected on the basis of their potential implication in hormonal sensitivity of breast tumours. Normalized RT-PCR data were analysed in an unsupervised manner by pairwise hierarchical clustering, and the statistical relevance of the defined subclasses was assessed by Chi2 analysis. The robustness of the selected subgroups was evaluated by classifying an external and independent set of tumours using these Chi2-defined molecular signatures. RESULTS: Hierarchical clustering of gene expression data allowed us to define a series of tumour subgroups that were either reminiscent of previously reported classifications, or represented putative new subtypes. The Chi2 analysis of these subgroups allowed us to define specific molecular signatures for some of them whose reliability was further demonstrated by using the validation data set. A new breast cancer subclass, called subgroup 7, that we defined in that way, was particularly interesting as it gathered tumours with specific bioclinical features including a low rate of recurrence during a 5 year follow-up. CONCLUSION: The analysis of the expression of 47 genes in 199 primary breast tumours allowed classifying them into a series of molecular subgroups. The subgroup 7, which has been highlighted by our study, was remarkable as it gathered tumours with specific bioclinical features including a low rate of recurrence. 
Although this finding should be confirmed by using a larger tumour cohort, it suggests that gene expression profiling using a minimal set of genes may allow the discovery of new subclasses of breast cancer that are characterized by specific molecular signatures and exhibit specific bioclinical features

    Application of multiple statistical tests to enhance mass spectrometry-based biomarker discovery

    Background: Mass spectrometry-based biomarker discovery has long been hampered by the difficulty in reconciling lists of discriminatory peaks identified by different laboratories for the same diseases studied. We describe a multi-statistical analysis procedure that combines several independent computational methods. This approach capitalizes on the strengths of each to analyze the same high-resolution mass spectral data set to discover consensus differential mass peaks that should be robust biomarkers for distinguishing between disease states. Results: The proposed methodology was applied to a pilot narcolepsy study using logistic regression, hierarchical clustering, t-test, and CART. Consensus differential mass peaks with high predictive power were identified across three of the four statistical platforms. Based on the diagnostic accuracy measures investigated, the performance of the consensus-peak model was a compromise between logistic regression and CART, which produced better models than hierarchical clustering and t-test. However, consensus peaks confer a higher level of confidence in their ability to distinguish between disease states since they do not represent peaks that are a result of biases to a particular statistical algorithm. Instead, they were selected as differential across differing data distribution assumptions, demonstrating their true discriminatory potential. Conclusion: The methodology described here is applicable to any high-resolution MALDI mass spectrometry-derived data set with minimal mass drift, which is essential for peak-to-peak comparison studies. Four statistical approaches with differing data distribution assumptions were applied to the same raw data set to obtain consensus peaks that were found to be statistically differential between the two groups compared. These consensus peaks demonstrated high diagnostic accuracy when used to form a predictive model as evaluated by receiver operating characteristics curve analysis. They should demonstrate a higher discriminatory ability as they are not biased to a particular algorithm. Thus, they are prime candidates for downstream identification and validation efforts.
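
The consensus step described in this abstract (keeping peaks flagged as differential by at least three of the four statistical methods) can be illustrated with a minimal Python sketch. The function name `consensus_peaks`, the `min_methods` parameter, and the m/z values are hypothetical and are not taken from the paper. The sketch also assumes that peaks have already been aligned to identical m/z labels, which is a simplification of the "minimal mass drift" requirement the authors mention.

```python
from collections import Counter


def consensus_peaks(selections, min_methods=3):
    """Return the peaks selected by at least `min_methods` of the methods.

    `selections` maps a method name to the list of peak labels (here m/z
    values) that the method flagged as differential.
    """
    # Count each peak once per method, even if a method lists it twice.
    counts = Counter(peak for peaks in selections.values() for peak in set(peaks))
    return sorted(peak for peak, n in counts.items() if n >= min_methods)


# Hypothetical, pre-aligned m/z values; not data from the study.
selections = {
    "logistic_regression": [1465.7, 2021.1, 3315.4],
    "hierarchical_clustering": [1465.7, 2660.0],
    "t_test": [1465.7, 2021.1, 4091.2],
    "cart": [2021.1, 1465.7, 3315.4],
}

consensus = consensus_peaks(selections)
print(consensus)
```

Raising `min_methods` to 4 would keep only peaks on which every method agrees. With real spectra, the exact-equality match used here would probably need to be replaced by matching within an m/z tolerance.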

    Analysis and Computational Dissection of Molecular Signature Multiplicity

    Molecular signatures are computational or mathematical models created to diagnose disease and other phenotypes and to predict clinical outcomes and response to treatment. It is widely recognized that molecular signatures constitute one of the most important translational and basic science developments enabled by recent high-throughput molecular assays. A perplexing phenomenon that characterizes high-throughput data analysis is the ubiquitous multiplicity of molecular signatures. Multiplicity is a special form of data analysis instability in which different analysis methods used on the same data, or different samples from the same population lead to different but apparently maximally predictive signatures. This phenomenon has far-reaching implications for biological discovery and development of next generation patient diagnostics and personalized treatments. Currently the causes and interpretation of signature multiplicity are unknown, and several, often contradictory, conjectures have been made to explain it. We present a formal characterization of signature multiplicity and a new efficient algorithm that offers theoretical guarantees for extracting the set of maximally predictive and non-redundant signatures independent of distribution. The new algorithm identifies exactly the set of optimal signatures in controlled experiments and yields signatures with significantly better predictivity and reproducibility than previous algorithms in human microarray gene expression datasets. Our results shed light on the causes of signature multiplicity, provide computational tools for studying it empirically and introduce a framework for in silico bioequivalence of this important new class of diagnostic and personalized medicine modalities

    Discriminating lymphomas and reactive lymphadenopathy in lymph node biopsies by gene expression profiling

    Background: Diagnostic accuracy of lymphoma, a heterogeneous cancer, is essential for patient management. Several ancillary tests including immunophenotyping, and sometimes cytogenetics and PCR, are required to aid histological diagnosis. In this proof of principle study, gene expression microarray was evaluated as a single platform test in the differential diagnosis of common lymphoma subtypes and reactive lymphadenopathy (RL) in lymph node biopsies. Methods: 116 lymph node biopsies diagnosed as RL, classical Hodgkin lymphoma (cHL), diffuse large B cell lymphoma (DLBCL) or follicular lymphoma (FL) were assayed by mRNA microarray. Three supervised classification strategies (global multi-class, local binary-class and global binary-class classifications) using diagonal linear discriminant analysis were performed on training sets of array data, and the classification error rates were calculated by leave-one-out cross-validation. The independent error rate was then evaluated by testing the identified gene classifiers on an independent (test) set of array data. Results: The binary classifications provided prediction accuracies, between a subtype of interest and the remaining samples, of 88.5%, 82.8%, 82.8% and 80.0% for FL, cHL, DLBCL, and RL respectively. Identified gene classifiers include LIM domain only-2 (LMO2), Chemokine (C-C motif) ligand 22 (CCL22) and Cyclin-dependent kinase inhibitor-3 (CDK3) specifically for FL, cHL and DLBCL subtypes respectively. Conclusions: This study highlights the ability of gene expression profiling to distinguish lymphoma from reactive conditions and classify the major subtypes of lymphoma in a diagnostic setting. A cost-effective single platform "mini-chip" assay could, in principle, be developed to aid the quick diagnosis of lymph node biopsies with the potential to incorporate other pathological entities into such an assay.

    A multi-filter enhanced genetic ensemble system for gene selection and sample classification of microarray data

    Background: Feature selection techniques are critical to the analysis of high dimensional datasets. This is especially true in gene selection from microarray data, which commonly have an extremely high feature-to-sample ratio. In addition to essential objectives such as reducing data noise, reducing data redundancy, improving sample classification accuracy, and improving model generalization, feature selection also helps biologists to focus on the selected genes to further validate their biological hypotheses. Results: In this paper we describe an improved hybrid system for gene selection. It is based on a recently proposed genetic ensemble (GE) system. To enhance the generalization property of the selected genes or gene subsets and to overcome the overfitting problem of the GE system, we devised a mapping strategy to fuse the goodness information of each gene provided by multiple filtering algorithms. This information is then used for the initialization and mutation operations of the genetic ensemble system. Conclusion: We used four benchmark microarray datasets (including both binary-class and multi-class classification problems) for proof of concept and model evaluation. The experimental results indicate that the proposed multi-filter enhanced genetic ensemble (MF-GE) system is able to improve sample classification accuracy, generate more compact gene subsets, and converge to the selection results more quickly. The MF-GE system is very flexible as various combinations of multiple filters and classifiers can be incorporated based on the data characteristics and the user preferences.

    Expanding the Understanding of Biases in Development of Clinical-Grade Molecular Signatures: A Case Study in Acute Respiratory Viral Infections

    The promise of modern personalized medicine is to use molecular and clinical information to better diagnose, manage, and treat disease, on an individual patient basis. These functions are predominantly enabled by molecular signatures, which are computational models for predicting phenotypes and other responses of interest from high-throughput assay data. Data-analytics is a central component of molecular signature development and can jeopardize the entire process if conducted incorrectly. While exploratory data analysis may tolerate suboptimal protocols, clinical-grade molecular signatures are subject to vastly stricter requirements. Closing the gap between standards for exploratory versus clinically successful molecular signatures entails a thorough understanding of possible biases in the data analysis phase and developing strategies to avoid them. Using a recently introduced data-analytic protocol as a case study, we provide an in-depth examination of the poorly studied biases of the data-analytic protocols related to signature multiplicity, biomarker redundancy, data preprocessing, and validation of signature reproducibility. The methodology and results presented in this work are aimed at expanding the understanding of these data-analytic biases that affect development of clinically robust molecular signatures. Several recommendations follow from the current study. First, all molecular signatures of a phenotype should be extracted to the extent possible, in order to provide comprehensive and accurate grounds for understanding disease pathogenesis. Second, redundant genes should generally be removed from final signatures to facilitate reproducibility and decrease manufacturing costs. Third, data preprocessing procedures should be designed so as not to bias biomarker selection. Finally, molecular signatures developed and applied on different phenotypes and populations of patients should be treated with great caution.

    Genetic Networks Controlling Structural Outcome of Glucosinolate Activation across Development

    Most phenotypic variation present in natural populations is under polygenic control, largely determined by genetic variation at quantitative trait loci (QTLs). These genetic loci frequently interact with the environment, development, and each other, yet the importance of these interactions on the underlying genetic architecture of quantitative traits is not well characterized. To better study how epistasis and development may influence quantitative traits, we studied genetic variation in Arabidopsis glucosinolate activation using the moderately sized Bayreuth×Shahdara recombinant inbred population, in terms of number of lines. We identified QTLs for glucosinolate activation at three different developmental stages. Numerous QTLs showed developmental dependency, as well as a large epistatic network, centered on the previously cloned large-effect glucosinolate activation QTL, ESP. Analysis of Heterogeneous Inbred Families validated seven loci and all of the QTL×DPG (days post-germination) interactions tested, but was complicated by the extensive epistasis. A comparison of transcript accumulation data within 211 of these RILs showed an extensive overlap of gene expression QTLs for structural specifiers and their homologs with the identified glucosinolate activation loci. Finally, we were able to show that two of the QTLs are the result of whole-genome duplications of a glucosinolate activation gene cluster. These data reveal complex age-dependent regulation of structural outcomes and suggest that transcriptional regulation is associated with a significant portion of the underlying ontogenic variation and epistatic interactions in glucosinolate activation

    Haptoglobin phenotype is not a predictor of recurrence free survival in high-risk primary breast cancer patients

    BACKGROUND: Better breast cancer prognostication may improve selection of patients for adjuvant therapy. We conducted a retrospective follow-up study in which we investigated sera of high-risk primary breast cancer patients, to search for proteins predictive of recurrence free survival. METHODS: Two sample sets of high-risk primary breast cancer patients participating in a randomised national trial investigating the effectiveness of high-dose chemotherapy were analysed. Sera in set I (n = 63) were analysed by surface enhanced laser desorption ionisation time-of-flight mass spectrometry (SELDI-TOF MS) for biomarker finding. Initial results were validated by analysis of sample set II (n = 371), using one-dimensional gel-electrophoresis. RESULTS: In sample set I, the expression of a peak at mass-to-charge ratio 9198 (relative intensity 20), identified as haptoglobin (Hp) alpha-1 chain, was strongly associated with recurrence free survival (global Log-rank test; p = 0.0014). Haptoglobin is present in three distinct phenotypes (Hp 1-1, Hp 2-1, and Hp 2-2), of which only individuals with phenotype Hp 1-1 or Hp 2-1 express the haptoglobin alpha-1 chain. As the expression of the haptoglobin alpha-1 chain, determined by SELDI-TOF MS, corresponds to the phenotype, initial results were validated by haptoglobin phenotyping of the independent sample set II by native one-dimensional gel-electrophoresis. With the Hp 1-1 phenotype as the reference category, the univariate hazard ratio for recurrence was 0.87 (95% CI: 0.56 - 1.34, p = 0.5221) and 1.03 (95% CI: 0.65 - 1.64, p = 0.8966) for the Hp 2-1 and Hp 2-2 phenotypes, respectively, in sample set II. CONCLUSION: In contrast to our initial results, the haptoglobin phenotype was not identified as a predictor of recurrence free survival in high-risk primary breast cancer in our validation set. Our initial observation in the discovery set was probably the result of a type I error (i.e. false positive). This study illustrates the importance of validation in obtaining the true clinical applicability of a potential biomarker.
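
As a rough consistency check on the hazard ratios quoted in this abstract, a two-sided Wald p-value can be approximately recovered from a hazard ratio and its 95% confidence interval. The sketch below assumes the interval is symmetric on the log scale and based on a normal approximation; the helper name `wald_p_from_ci` is hypothetical, and this is not necessarily the procedure the authors used. Because the published figures are rounded, the result is only expected to agree with the reported p-values to about two decimal places.

```python
import math


def wald_p_from_ci(hr, lower, upper, z_crit=1.959964):
    """Approximate the standard error, z statistic and two-sided p-value
    from a hazard ratio and its 95% confidence interval.

    Assumes a normal approximation on the log hazard ratio scale.
    """
    # The CI spans 2 * z_crit standard errors on the log scale.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Values quoted in the abstract (reference category: Hp 1-1).
_, _, p_hp21 = wald_p_from_ci(0.87, 0.56, 1.34)  # reported p = 0.5221
_, _, p_hp22 = wald_p_from_ci(1.03, 0.65, 1.64)  # reported p = 0.8966
print(round(p_hp21, 2), round(p_hp22, 2))
```

Both intervals include 1, which is consistent with the abstract's conclusion that the haptoglobin phenotype did not predict recurrence in the validation set.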